    Improving cluster recovery with feature rescaling factors

    The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation, rescaling features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically; instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data with and without noise features, demonstrates that clustering methods using the proposed data normalisation strategy clearly outperform those using traditional data normalisation.
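    The abstract does not give the rescaling formula. The following is only a rough illustration. It contrasts traditional z-score normalisation with one simple way to make rescaling relevance-aware. After an initial clustering, each standardised feature is multiplied by a weight inversely proportional to its within-cluster dispersion, so features that are tight inside clusters count for more. Several parts are assumptions and not the paper's method: the inverse-dispersion weighting rule, the names `zscore` and `relevance_rescale`, the synthetic data, and the need for an initial labelling.

```python
import numpy as np

def zscore(X):
    """Traditional normalisation: every feature gets mean 0 and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sd

def relevance_rescale(X, labels, eps=1e-12):
    """Hypothetical relevance-aware rescaling (illustrative assumption only).

    Each standardised feature is weighted by the inverse of its within-cluster
    dispersion, so features whose values are tight inside clusters count for more.
    """
    Z = zscore(X)
    dispersion = np.zeros(Z.shape[1])
    for k in np.unique(labels):
        members = Z[labels == k]
        # Sum of squared deviations from the cluster mean, per feature
        dispersion += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    inv = 1.0 / (dispersion + eps)
    weights = inv / inv.sum()  # positive weights that sum to 1
    return Z * weights, weights

# Synthetic example: feature 0 separates two clusters, feature 1 is pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
signal = np.where(labels == 0, 0.0, 5.0) + rng.normal(0, 0.5, 100)
noise = rng.normal(0, 5.0, 100)
X = np.column_stack([signal, noise])
Xw, w = relevance_rescale(X, labels)
print(w)  # the informative feature 0 receives the larger weight
```

    The rescaled features feed a squared Euclidean distance. A feature with weight w therefore contributes w² times its standardised squared difference. Features that are noisy within clusters are damped rather than treated on an equal footing with informative ones.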

    Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification

    With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which supports the diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting set of near-Pareto-optimal solutions contains a number of non-dominated solutions. We propose a novel approach that combines the clustering information held by these non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method is compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets, and statistical significance tests are conducted to establish its statistical superiority. Furthermore, relevant gene markers are identified from the clustering result produced by the proposed method and shown visually. Biological relationships among the gene markers are also studied using gene ontology. The results are promising and may have an important impact on unsupervised cancer classification as well as on gene marker identification for multiple cancer subtypes.
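    The following sketch illustrates the non-dominance idea. It is not the paper's genetic algorithm, encoding, or validity indices. It scores candidate sets of cluster centers on two objectives, compactness (minimized) and separation (maximized), and keeps only the solutions that no other solution dominates. The objective definitions and the names `objectives`, `dominates`, and `non_dominated` are hypothetical.

```python
import numpy as np

def objectives(X, centers):
    """Return (compactness, separation) for one candidate set of cluster centers.

    compactness: mean distance from each point to its nearest center (minimize).
    separation: smallest distance between any two centers (maximize).
    These definitions are illustrative assumptions, not the paper's indices.
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    compactness = d.min(axis=1).mean()
    cd = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    separation = cd[np.triu_indices(len(centers), k=1)].min()
    return float(compactness), float(separation)

def dominates(a, b):
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better

def non_dominated(scores):
    """Indices of the solutions that no other solution dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)
centers = np.array([[0, 0.5], [10, 0.5]])
print(objectives(X, centers))  # → (0.5, 10.0)

# (compactness, separation) pairs for four hypothetical candidate solutions
scores = [(1.0, 2.0), (0.8, 1.5), (1.2, 2.5), (1.1, 1.9)]
print(non_dominated(scores))  # → [0, 1, 2]; solution 3 is dominated by solution 0
```

    The pairwise check is quadratic in the number of solutions. That is adequate for the small populations typical of such a sketch. Dedicated non-dominated sorting procedures are used when populations grow.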

    History on the biological nitrogen fixation research in graminaceous plants: special emphasis on the Brazilian experience

    The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies

    Matching learning corresponds to the combination of ontology matching and machine learning techniques. This strategy has gained increasing attention in recent years. However, state-of-the-art approaches implementing matching learning strategies are not well tailored to imbalanced training sets. In this paper, we address the problem of imbalanced training sets and their impact on the performance of matching learning in the context of aligning biomedical ontologies. Our approach is applied to local matching learning, a technique that divides a large ontology matching task into a set of distinct local sub-matching tasks. Each local sub-matching task relies on a local classifier built from its own balanced local training set, and these local classifiers then discover the alignments of their sub-matching tasks. To validate our approach, we present an experimental study that analyzes the impact of conventional resampling techniques on the quality of local matching learning.
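    The abstract mentions "conventional resampling techniques" without listing them. Random oversampling and random undersampling are two such techniques. The following sketch applies both to a hypothetical local training set of candidate concept pairs. Each pair is labelled 1 (match) or 0 (non-match) and described by made-up similarity features. The function names and data are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start from all original rows
    for c, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == c)
            keep.append(rng.choice(idx, size=target - n, replace=True))
    order = np.concatenate(keep)
    return X[order], y[order]

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    order = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=False)
        for c in classes
    ])
    return X[order], y[order]

# Hypothetical local training set: 95 non-matching pairs, 5 matching pairs.
rng = np.random.default_rng(42)
X = rng.random((100, 3))          # made-up similarity scores per candidate pair
y = np.array([0] * 95 + [1] * 5)
Xo, yo = random_oversample(X, y)
Xu, yu = random_undersample(X, y)
print(np.bincount(yo), np.bincount(yu))  # → [95 95] [5 5]
```

    Oversampling keeps every non-match but repeats matches. Undersampling discards most non-matches. Comparing such trade-offs is the subject of the experimental study described in this abstract. The imbalanced-learn package provides `RandomOverSampler` and `RandomUnderSampler` for the same purpose.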